REFLORA Cleaning Workflow
This article walks through a practical case study: plant specimen records are downloaded from the REFLORA Virtual Herbarium with the refloraR package, then cleaned and harmonized with the barRoso package.
Goal
Demonstrate a full cleaning pipeline for biodiversity records from REFLORA, covering:
- Programmatic data download with refloraR
- Collector and record number standardization
- Duplicate detection
- Output preparation for downstream analysis
Step 1: Install Required Packages
# install.packages("devtools")
devtools::install_github("DBOSlab/refloraR")
devtools::install_github("DBOSlab/barRoso")
Step 2: Download Specimens from REFLORA
Use the refloraR package to retrieve specimen records for a given taxon and herbarium:
library(refloraR)
records <- reflora_records(taxon = "Fabaceae",
                           herbarium = "RB",
                           save = FALSE)
Step 3: Run barRoso Standardization
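Before running the standardization, it can help to confirm that the downloaded table actually carries the columns named in the colname_ arguments. The following is a minimal base R sketch of such a check. The `records_toy` data frame and its values are hypothetical stand-ins for real output from reflora_records(), and the check itself is not part of barRoso.

```r
# Hypothetical toy table standing in for the downloaded records.
records_toy <- data.frame(
  recordedBy    = c("A. Silva", "Silva, A."),
  recordNumber  = c("123", "123"),
  country       = c("Brazil", "Brasil"),
  stateProvince = c("Bahia", "BA"),
  stringsAsFactors = FALSE
)

# Column names that the standardization call in this step refers to.
required_cols <- c("recordedBy", "recordNumber", "country", "stateProvince")

# setdiff() returns the required names that are absent from the table.
missing_cols <- setdiff(required_cols, names(records_toy))

# Stop early with a readable message if anything is missing.
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

With real data, the same check would be applied to `records` instead of `records_toy`.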
library(barRoso)
cleaned <- barroso_std(records,
                       colname_recordedBy = "recordedBy",
                       colname_recordNumber = "recordNumber",
                       colname_country = "country",
                       colname_stateProvince = "stateProvince",
                       flag_duplicates = TRUE,
                       rm_duplicates = FALSE)
Step 4: Explore Results
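As a rough mental model for the duplicate column explored in this step, duplicate flagging can be thought of as marking rows that share the same collector and record number. The sketch below illustrates that idea in base R on a hypothetical toy table. It is an assumption-laden illustration, not barRoso's actual algorithm, and the `duplicate_toy` column name is invented here.

```r
# Hypothetical toy records: rows 1 and 2 share collector and number.
toy <- data.frame(
  recordedBy   = c("Silva", "Silva", "Souza", "Silva"),
  recordNumber = c("10", "10", "10", "11"),
  stringsAsFactors = FALSE
)

# Build one key per row from collector and record number.
key <- paste(toy$recordedBy, toy$recordNumber, sep = "_")

# Flag every member of a repeated key, including its first occurrence,
# by scanning from both ends of the vector.
toy$duplicate_toy <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Tabulate flagged versus unflagged rows.
table(toy$duplicate_toy)
```

Here only the first two rows are flagged, because the third row has a different collector and the fourth a different number.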
table(cleaned$duplicate)
head(cleaned[, c("recordedBy", "recordNumber", "duplicate")])
Step 5: Save Cleaned Output
write.csv(cleaned, "reflora_cleaned.csv", row.names = FALSE)
Summary
With this REFLORA case study, we demonstrated how to:
- Download REFLORA data programmatically with refloraR
- Clean and harmonize biodiversity records using barRoso
- Detect and flag duplicates across herbarium specimens
This workflow is reusable for any taxon and herbarium supported by REFLORA. As a next step, explore additional tools such as barroso_cat() to merge datasets from GBIF, JABOT, and speciesLink.
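To give a feel for what merging sources involves, here is a minimal base R sketch that stacks two hypothetical toy tables on their shared columns and records where each row came from. It does not use or reproduce barroso_cat(), whose behavior is not described here. The table contents and the `source` column are assumptions made for illustration.

```r
# Hypothetical toy extracts from two sources with partly different columns.
reflora_toy <- data.frame(
  recordedBy = "Silva", recordNumber = "10", herbarium = "RB",
  stringsAsFactors = FALSE
)
gbif_toy <- data.frame(
  recordedBy = "Souza", recordNumber = "22",
  basisOfRecord = "PRESERVED_SPECIMEN",
  stringsAsFactors = FALSE
)

# Keep only the columns that both tables have in common.
shared <- intersect(names(reflora_toy), names(gbif_toy))

# Stack the two tables and tag each row with its origin.
combined <- rbind(
  cbind(reflora_toy[, shared, drop = FALSE], source = "REFLORA"),
  cbind(gbif_toy[, shared, drop = FALSE], source = "GBIF")
)
```

Restricting to shared columns keeps the sketch simple. A real merge would usually map differently named fields onto a common schema rather than drop them.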